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OPTIMAL  CONTROL  THROUGH 
BIOLOGICALLY-INSPIRED  PURSUIT  1 


Cheng  Shao  and  D.  Hristu-Varsakelis 

Department  of  Mechanical  Engineering  and 
Institute  for  Systems  Research 
University  of  Maryland,  College  Park,  MD  20742  USA 

Abstract:  Inspired  by  the  process  by  which  ants  gradually  optimize  their  foraging 
trails,  this  paper  investigates  the  cooperative  solution  of  a  class  of  free-final 
time,  partially-constrained  final  state  optimal  control  problems  by  a  group  of 
dynamic  systems.  A  cooperative,  pursuit-based  algorithm  is  proposed  for  finding 
optimal  solutions  by  iteratively  optimizing  an  initial  feasible  control.  The  proposed 
algorithm  requires  only  short-range,  limited  interactions  between  group  members, 
and  avoids  the  need  for  a  “global  map”  of  the  environment  on  which  the  group 
evolves.  The  performance  of  the  algorithm  is  illustrated  in  a  series  of  numerical 
experiments. 

Keywords:  Co-operative  control,  Optimization,  Agents,  Group  work,  Trajectories 


1.  INTRODUCTION 

In  recent  years,  problems  in  cooperative  control 
are  increasingly  capturing  the  attention  of  re¬ 
searchers,  fueled  by  the  development  of  decentral¬ 
ized  control  systems  with  cost  and  performance 
advantages.  The  rising  interest  in  deploying  coop¬ 
erative  systems  also  stems  from  the  fact  that  such 
systems  have  the  potential  to  perform  tasks  that 
are  not  feasible  for  individuals.  Examples  include 
remote  exploration  and  information  gathering  by 
swarms  of  small  autonomous  robots  (Brooks  and 
Flynn,  1989),  and  satellite  arrays,  to  name  a 
few.  Members  of  such  “engineered  collectives” 
usually  have  limited  sensing,  communication  and 
computing  capabilities.  This  suggests  that  each 
member  can  only  perform  relatively  simple  tasks. 
On  the  other  hand,  the  limitations  of  individuals 
can  often  be  overcome  by  cooperation,  if  one  can 
identify  an  effective  way  to  organize  the  group 
into  “more  than  the  sum  of  its  parts”.  This  is 


1  This  work  was  supported  by  the  National  Science 
Foundation  under  Grant  No.  EIA0088081  and  by  ARO 
ODDR&E  MURI01  Grant  No.  DAAD19-01-1-0465,  (Cen¬ 
ter  for  Communicating  Networked  Control  Systems, 
through  Boston  University).  Corresponding  author:  D. 
Hristu-Varsakelis,  Tel:  +1-301-405-5283,  Fax:  +1-301-314- 
9477,  e-mail:  hristu@umd.edu 


often  demonstrated  rather  impressively  -  by 
biological  collectives.  For  example,  a  school  of 
fish  can  coordinate  their  movement  in  a  tight  for¬ 
mation;  worker  honey  bees  share  information  by 
“dancing”  and  distribute  themselves  among  nec¬ 
tar  sources  in  accordance  with  the  profitability  of 
each  source;  ants  are  known  to  utilize  pheromone 
secretions  for  recruiting  nest-mates  and  for  opti¬ 
mizing  their  foraging  trails  (Camazine,  2001).  Ob¬ 
servations  of  such  activities  in  nature  have  already 
seeded  a  variety  of  research,  from  modeling  of 
animal  group  behaviors  (Camazine,  2001;  B ruck- 
stein,  1993;  Jadbabaie  et  al,  2003),  to  distrib¬ 
uted  collective  covering  and  searching  (Wagner 
et  al. ,  1999),  cooperative  estimation  (Roumeliotis 
and  Bekey,  2002)  and  biologically-motivated  opti¬ 
mization  (Dorigo  et  al,  1996). 

One  of  the  earliest  optimization  methods  in¬ 
spired  by  trail  formation  in  ants  was  presented  in 
(Bruckstein,  1993),  where  it  was  shown  that  ants 
that  “pursued”  one  another  on  M  (each  point¬ 
ing  its  velocity  vector  towards  a  predecessor)  had 
the  effect  of  producing  progressively  “straighter” 
trails.  That  idea  was  later  extended  to  path  opti¬ 
mization  problems  involving  kinematic  vehicles  in 
non-Euclidean  environments  (Hristu-Varsakelis, 
2000).  The  last  two  works  were  restricted  ex- 


clusively  to  the  “discovery”  of  geodesics,  mean¬ 
ing  that  the  autonomous  system-members  of  the 
group  had  very  simple  dynamics  with  no  drift 
terms.  This  paper  shows  that  the  earlier  work  can 
be  generalized  to  a  much  broader  class  of  optimal 
control  problems,  and  collectives  whose  members 
have  non-trivial  dynamics.  The  proposed  algo¬ 
rithm  is  based  on  “local  pursuit”  (to  use  the  term 
coined  in  (Bruckstein,  1993))  and  guides  members 
of  a  group  towards  a  solution  of  a  free  final  time, 
partially-constrained  final  state  optimal  control 
problem,  this  time  in  a  broader  and  more  intri¬ 
cate  setting.  Under  “continuous  local  pursuit”,  as 
this  iterative,  decentralized  algorithm  is  termed, 
agents  do  not  need  a  global  map  of  their  environ¬ 
ment  or  even  an  agreed-upon  common  coordinate 
system.  The  algorithm  is  most  useful  in  trajec¬ 
tory  optimization  problems  which  are  easier  to 
solve  when  boundary  conditions  are  “close”  to  one 
another  (because  of,  for  example,  the  members’ 
computational  or  sensing  limitations),  with  the 
term  “close”  taken  to  include  not  only  geograph¬ 
ical  separation  but  also  distance  on  the  manifold 
on  which  copies  of  a  dynamical  system  evolve. 

The  remainder  of  this  paper  is  organized  as  fol¬ 
lows:  Section  2  describes  the  optimal  problems  to 
be  addressed  and  proposes  an  iterative  algorithm 
that  is  appropriate  for  a  group  of  cooperative 
dynamic  systems.  Section  3  discusses  the  main 
results  concerning  the  performance  of  the  pro¬ 
posed  algorithm.  A  pair  of  illustrative  numerical 
examples  are  given  in  Section  4. 


2.  A  BIO-INSPIRED  ALGORITHM  FOR 
OPTIMAL  CONTROL 

This  paper  explores  the  solution  of  optimal  control 
problems  using  a  group  of  cooperating  “agents”. 
The  term  “agent”  refers  to  a  member  of  a  group 
of  dynamical  systems,  each  taken  to  be  a  “copy” 
of: 

Xk  =  f(xk,uk),  xk{t )  G  Rn,uk(t)  G  Cl  C  Rm  (1) 

for  k  =  0,1,2....  Physically,  each  copy  of  (1) 
could  stand  for  a  robot,  UAV  or  other  autonomous 
system.  The  problem  of  interest  is  as  follows: 


Definition  1.  Given  the  final  state  constraint  Q(x)  = 
0,  the  constraint  set  of  x  is 

Sq  =  {x\Q(x)  =  0}. 


Now  the  function  G(x)  in  (2)  can  be  replaced  by 


G(x) 


F(x)  if  x  G  SQ 
0  if  X  (f  Sq 


with  F(x)  >  0,Vx  G  Sq.  Problem  1  involves 
optimal  control  with  free  final  time,  partially- 
constrained  final  state.  Fixed  final  state  problems, 
where  Sq  is  a  single  state  (Shao  and  Hristu- 
Varsakelis,  2004),  are  special  cases  of  what  are 
considered  here. 

For  any  pair  of  fixed  states  a,  b  G  D  C  M",  suppose 
the  optimal  trajectory  from  a  to  b  with  free  final 
time  (minimizing  J  with  respect  to  i,  T  only) 
is  denoted  by  x*(t)  and  that  the  corresponding 
optimal  final  time  is  r*(a,  b).  The  cost  of  following 
x*  is  denoted  as: 


rto+r* 

rj(a,b,t0)  =  /  g(x*,x*)dt  +  G(x*(t0 +  T*)) 
Jto 

=  min  J(x,  x,  to)  (3) 

x,r 

subject  to  x(t o)  =  a,  x(to  +  T)  =  b. 

Now,  let  x*  (t)  be  the  optimal  trajectory  from  an 
initial  state  a  to  the  constraint  set  Sq  ,  and  let  the 
corresponding  optimal  final  time  from  a  to  Sq  be 
T*Q(a,SQ).  The  cost  of  following  x*  is  denoted  by 


VQ(a,to 


g(x*,x*)dt  +  G(x*(to  +  Tq)) 


=  min  J(x ,  x,  to) 

zTq 


(4) 


subject  to  x(t o)  =  a,Q(x(to  +Pq))  =  0. 

The  cost  of  following  a  generic  trajectory  x(t)  of 

(1)  during  [to,  to  +  o')  is  denoted  by: 


pto+a 

C(x,t0,a)=  /  g{x,  x)dt  +  G(x(t0  +  a))  (5) 
Jt0 


Problem  1.  Find  a  trajectory  x*(t),  a  final  time 
r*  >  0  and  a  final  state  x*(r*)  that  minimize 

rto+r 

J(x,x,to)=  /  g(x,  x)dt  +  G(x(t0  +  T))  (2) 

Jto 

subject  to  the  constraints  x(to)  =  Xq  and  Q(x(to+ 

F))=0. 

Here  it  is  assumed  that  g(x(t),x(t))  >  0,  G(x(t0  + 
r))  >  0  and  that  Q(-)  is  an  algebraic  function  of 
the  state. 


The  following  facts  can  be  derived  easily  from  the 
properties  of  optimal  trajectories  and  are  helpful 
in  the  sequel: 

Fact  2.  Let  rj,r]Q:C  as  defined  in  (3)(4)(5),  and 
Xk(t)  be  a  generic  trajectory  of  (1).  Then,  the 
following  hold: 

(1)  T](a,b,t0)  <  C(xk,t0,  T)  for  any  x+)  with 
Xk(t0 )  =  a,  Xk(t0  +  T)  =  b. 

(2)  r](a,c,to)  <  r](a,b,to)+r](b,c,to+cr),  with  a  = 

r*(o,  b). 

(3)  r]Q(a,t0 )  <  r](a,b,to)  for  any  b  G  Sq. 


Assume  that  there  is  available  an  initial  feasible 
control/trajectory  pair  (ufeas(t),  Xfeas(t))  (but 
sub-optimal)  for  (1),  obtained  through  a  combi¬ 
nation  of  a-priori  knowledge  about  the  problem 
and/or  random  exploration.  Inspired  by  the  idea 
in  (Bruckstein,  1993),  the  agents  are  scheduled  to 
leave  the  initial  state  Xq  sequentially  and  pursue 
one  another  in  a  way  which  will  be  made  precise 
shortly.  The  sequence  is  initiated  with  the  first 
agent  following  x/eas  to  the  set  Sq.  Each  agent 
will  attempt  to  intercept  its  predecessor  -  along 
optimal  trajectories  defined  by  (3)  -  if  the  pre¬ 
decessor  has  not  reached  the  final  state.  If  the 
predecessor  has  already  reached  the  constraint  set 
Sq,  then  the  pursuer  evolves  along  the  optimal 
trajectory  defined  by  (4).  The  precise  rules  that 
govern  the  movement  of  each  agent  are: 

Algorithm  1.  (Continuous  Local  Pursuit):  Iden¬ 
tify  the  starting  state  Xo  on  D  and  the  constraint 
set  Sq.  Let  xq (t)  ( t  £  [0,To])  be  an  initial  trajec¬ 
tory  satisfying  (1)  with  £o(0)  =  Xo ,  Q(xo(To))  = 
0.  Choose  the  pursuit  interval  A  such  that  0  < 
A<T0. 

(1)  For  k  =  1,2,3...,  let  tk  =  fcA  be  the  starting 
time  of  kth  agent.  Let  Uk{t)  =  0 ,xk(t)  =  xq 
for  0  <  t  <tk. 

(2)  For  all  t  >  tk,  calculate  u((t)  for  all  t  £ 
[tk,tk+Tk]  such  that  f(xk(r),  uJ(t))  =  xt(r), 
and  Xt(r)  achieves 

r](xk(t),xk-i{t),t),  if  xk-i (t)  £  Sq 
(t  £  [t,t  +  T*(xk(t),xk-i(t)))) 
VQ(xk(t),t),  if  xk-i (t)  £  Sq 

(t  g  [t,t  +  r*Q(xk(t),sQ)]) 

(3)  Apply  uk(t)  =  m/(0)  to  the  kth  agent. 

(4)  Repeat  from  step  2,  until  the  kth  agent 
reaches  Sq. 

Under  continuous  local  pursuit  (CLP),  each  agent 
continuously  updates  its  own  trajectory  at  every 
t  £  [tk,tk  +  Tk).  It  is  possible  to  alter  the  algo¬ 
rithm  so  that  each  agent  only  performs  a  finite 
number  of  trajectory  optimizations  as  it  evolves 
from  Xq  to  Sq.  The  resulting  “sampled  local  pur¬ 
suit”  algorithm  is  detailed  in  (Shao  and  Hristu- 
Varsakelis,  2004). 

The  (k  —  l)th  agent  is  designated  as  the  “leader” 
and  the  kth  agent  as  the  “follower”  during  pursuit. 
As  Step  2  of  the  algorithm  indicates,  there  are  two 
types  of  followers’  movements,  “free  running”  and 
“catching  up”,  depending  on  whether  the  leader 
has  reached  the  final  constraint  Sq  or  not.  The 
former  type  lets  agents  “learn”  from  their  leaders, 
while  the  “free  running”  stage  enables  them  to 
find  the  optimal  final  state  within  Sq.  Both  stages 
are  essential  for  the  cooperative  solution  to  an 
optimization  problem  with  partially-constrained 
final  state. 


3.  MAIN  RESULTS 

The  CLP  algorithm  defines  an  ordered  sequence  of 
trajectories  {ccfc(t)}.  This  section  will  first  investi¬ 
gate  the  convergence  of  the  trajectories’  cost,  and 
then  will  show  that  the  trajectories  themselves 
converge  to  a  local  optimum.  It  will  be  convenient 
to  distinguish  between  the  planned  trajectories, 
denoted  by  x{t),  that  a  follower  computes  at  every 
point  in  time,  and  the  realized  trajectories,  de¬ 
noted  by  x(t),  which  the  follower  actually  evolves 
along. 

Lemma  1.  Consider  a  leader-follower  pair  and  a 
pursuit  interval  A  defined  in  continuous  local 
pursuit.  Let  the  leader’s  trajectory  be  xk-i  (t)  (t  G 
[tk-i,tk-i  +  Th-i])  and  A  G  [0,Tfc_i).  Suppose 
the  follower  updates  its  trajectory  only  once  during 
[tk,tk  +Tk]  as  described  next: 

•  If  A  <  Xfc_ i  —  A,  the  follower  moves  along 
the  optimal  trajectory  joining  xk  ( tk  +  A)  and 
xk-i(tk+X)  (in  the  sense  of  (3))  with  optimal 
final  time  T  =  T*(xk(tk  +  A), xk-\ (tk  +  A)). 
During  other  times,  the  follower  replicates 
the  leader’s  trajectory,  i.e. 

xk(tj  —  xk~i {t  A)  t  G  [tk,  tk  T  A] 

Xk(t)  =  xk-\(t  —  r)  t  G  [tk  +  X  +  r,  tk  +  Tk\ 

•  If  X  >  Tfc-i  —  A,  the  follower  evolves  along 
the  optimal  trajectory  from  xk(tk  +  A)  to 
the  constraint  set  Sq  (in  the  sense  of  (4))- 
Similarly,  during  other  times 

xk(t)  =  xk-!(t  -  A)  f  G  [tk,tk  +  A] 

Then  the  cost  along  the  follower’s  trajectory  will 
be  no  greater  than  the  leader’s. 

PROOF.  First,  choose  A  <  Tk_\  —  A.  Starting  at 
time  ffc  +  A  and  during  t  G  [ffc  +  A,ffc  +  A-|-r]),  the 
follower  moves  on  the  locally  optimal  trajectory 
xk(t).  The  cost  along  xk  is 

C{xk,tk,Tk) 

=  C(xk,  tk,  A)  +  C(xk,tk  +  A  +  r,Tfc-A-r) 
+v(xk{tk  +  A),  xk-i  (tk  +  A),  tk  +  A) 

<  C(xk-i,tk-i,  A)  +  C(xk- 1,  tk-\  +  A,  A) 

+C(xk- 1,  ffc_i  +  A  +  A,  Tfc_  i  —  A  —  A) 

=  C(ii_i,4_i,rt_i)  (6) 

where  V  =  T*(xk(tk  +  A),a;fc_i(ffc  +  A)).  If  A  > 
Tfc_ i  —  A,  the  cost  along  xk  is 

C{xk,tk,Tk) 

—  C(xk,tk,  A)  T  rjQ ( xk [tk  T  A),f&T  A) 

<  C{xk- 1,  tk~i,  A)  +  C(xk-i,tk-i  +  A,  Tk_ i  —  A) 
C(xk— i ,  tk—x,  Tk— i) 


Therefore  the  cost  along  the  follower’s  trajectory 
is  no  greater  than  the  leader’s.  □ 

Now  the  cost  of  the  iterative  trajectories  can  be 
shown  to  converge  under  CLP: 

Lemma  2.  (Convergence  of  Cost)  If  the  agents  of 
(1)  evolve  under  CLP,  the  cost  of  the  iterated 
trajectories  converges. 

PROOF.  Suppose  the  cost  along  the  leader’s 
trajectory  xk-i(t)  ( t  G  [tk-i,tk-i  +  Tfc-i])  is 
Ck- 1.  Define  a  trajectory  sequence  x\(t)  ( t  G 
[tk,  tk  +  Tk]),i  =  0,1,2...  whose  corresponding 
costs  and  final  times  are  Clk  and  Tj,  as  follows:  let 
xk(t)  =  xk-\ (t)  (i.e.  the  trajectory  of  a  “leader”) 
and  x\  (i  >  0)  is  the  trajectory  of  an  agent 
that  pursues  xlkl  by  performing  only  a  single 
trajectory  update ,  as  described  in  Lemma  1  with 
A  =  (i  —  1)5, 5  >  0  (see  Fig.  1). 


Fig.  1.  Illustration  of  the  trajectory  sequence 
x\(t).  Each  trajectory  is  obtained  by  a  single 
update  upon  its  predecessor. 

According  to  Lemma  1,  the  cost  of  each  follower’s 
trajectory  will  be  no  greater  than  the  leader’s, 
thus 

ci  <  q-1  =►  q°°  <  cl  =  cfc_r 

with  5  >  0. 

Let  S  =  Tk_i/i,  then  S  — >  0  as  i  — >  oo.  At 
the  limit,  the  trajectory  x£°(t)  is  exactly  what 
would  be  obtained  by  an  agent  that  pursues 
another  evolving  along  xk-i,  using  CLP.  Hence 
the  follower’s  cost  is  Ck  =  Cjf3  <  Ck- 1-  Since 
the  sequence  {Ck}  is  non-increasing  and  bounded 
below  (there  exists  a  minimum  for  (2)  ),  it  must 
converge  to  a  limit.  □ 

To  proceed  to  the  main  theorem,  one  must  require 
that  the  optimal  cost  of  (2)  changes  “little”  for 
small  changes  to  the  endpoints  of  a  trajectory: 

Condition  1.  Assume  for  a  generic  trajectory  x\  (t) 
there  exists  an  e  >  0  such  that  for  all  a,  b±,  62  G  D 
and  all  A  >  0,  there  exists  a  trajectory  X2 (t) 
such  that  the  cost  C(xi,0,T)  (xi(0)  =a,X\(T)  = 
bi)  from  a  to  bi  and  cost  C(x2,0,T)  (x2(0)  = 
a,X2(T)  =  62)  from  a  to  &2  satisfy 


\\bi  -  &2II00  <  £ 

=►  \\C{xx,  0,  T)  -  C(x 2,0,  T) iq  <  CA 
for  some  constant  C  independent  of  A. 

Then  the  next  lemma  holds: 

Lemma  3.  Let  x*(t)  be  a  trajectory  of  (1)  such 
that:  i)  x*(t)  (t  G  [0,fi  +  Ai])  is  optimal  (in  the 
sense  of  (3))  from  x*(0)  to  x*(t±  +  Ai),  and  ii) 
x*(t)  (t  G  [ti,T*])  is  optimal  (in  the  sense  of 
(4))  from  x*(ti)  to  the  constraint  set  Sq.  Assume 
Condition  1  is  satisfied  and  0  <  t\  <  ti+Ai  <  T* . 
Then  the  trajectory  x*(t)  ( t  G  [0,T*])  is  a  local 
minimum  of  (4)  from  x*(0)  to  Sq. 

PROOF.  Pick  0  <  A  <  Ai.  From  principle  of 
optimality,  x*(t)  (t  G  [0,ti  +  A])  and  x*(t)  (t  G 
[t\,T*])  are  each  locally  optimal  with  respect  to 
their  corresponding  end  points.  Suppose  ||x*(ti  + 
A)  —  s||oo  >  £\  for  any  s  G  Sq  and  that  x*(t)  ( t  G 
[0,T*])  is  not  a  local  minimum.  There  must  exist 
a  e  <  min(e,£i/2)  (e  is  defined  in  Condition  1) 
and  another  optimum  x(t)  Glx  [0,T]  satisfying 
that  ||x(t)  —  x* (t) || oo  <  e  and  C(x(t),0,T)  < 
C(x*(t),0,T*). 


Fig.  2.  Illustrating  the  proof  of  Lemma  3:  over¬ 
lapping  optimal  trajectories  form  a  locally 
optimal  trajectory. 

Notice  that  ||x(ii  +  A)  —  s||oo  >  e  for  any  s  G  Sq. 
Construct  two  trajectories  y\ (t) ,  (t)  (t  G  [ti,ti  + 

A])  that  connect  x(t)  and  x*(t)  (see  Fig.  2)  and 
satisfy  Condition  1  (with  x*  or  x  playing  the  role 
of  xi,  and  y  1,  2/2  standing  in  for  X2).  In  particular, 
let  yi,  2/2  be  such  that  x*(tf)  =  y2{t\),x*{t\  + 
A)  =  2/1  (*i  +  A),x(H)  =  yi(ti),x\ti  +  A)  = 
2/2 (ti  +  A).  Now  , Condition  1  implies  that 

C(y1(t),t1,A)<C(x(t),t1,A)  +  CA 
C(y2(t),  ti,A)<  C(x* (t),  h,A)  +  CA  (7) 

Because  x*(t)  ( t  G  [0,  t\  +  A])  and  x*(t)  (t  G 
[ti,T*])  are  each  locally  optimal,  the  following 
holds: 

C7(a:*(*),0,t1)  H-  C?(as*  (t) ,  ,  A)  (8) 
<C(x(t),0,ti)  +  C(yi(t),ti,A),  and 


C(x*(t),h,  A)  + 

C,(s*(i),ti  +  A,T*-ti-A)  < 

C(x(t),  t\  +  A,  T  —  ti  —  A)  + 
C(y2(t),t1,A)  (9) 

Combining  (7)  with  (8,9)  leads  to 

C(x*  (£),  0,  T)  <  C(x(t),0,  T)  +  2CA  (10) 

The  cost  C(x(t),0,T)  is  apparently  less  than 
C'(a;*(£),0,T),  but  if  A  is  chosen  so  that 

0  A  C(x*(t),0,T)~C(x(t),0,T) 

2  C 

then  (10)  cannot  hold.  This  is  a  contradiction, 
because  A  could  be  chosen  arbitrarily  small.  It 
follows  that  x*(t)  (t  G  [0,T*])  must  be  a  local 
minimum.  □ 

The  following  lemmas  also  hold.  The  proofs  (both 
by  contradiction)  can  be  found  in  (Shao  and 
Hristu-Varsakelis,  2004)  and  will  not  be  given 
here. 

Lemma  j.  If  for  all  times  during  CLP,  the  lo¬ 
cally  optimal  trajectory  from  follower  to  leader  is 
unique,  then  the  limiting  trajectory  Xoa(t)  exists 
and  is  unique. 

Lemma  5.  Along  the  limiting  trajectory  produced 
under  CLP,  the  planned  trajectories  Xk(t)  and 
realized  trajectories  Xk(t)  overlap,  i.e.  Xk(t)  = 
Xk{t )•  Furthermore,  if  the  locally  optimal  trajecto¬ 
ries  obtained  at  every  updating  time  are  smooth, 
then  the  limiting  trajectory  is  also  smooth. 

The  next  theorem  is  an  immediate  consequence  of 
Lemmas  1  —  5: 

Theorem  1.  Suppose  a  group  of  agents  evolve  un¬ 
der  CLP  and  that  at  every  updating  time  t,  the 
locally  optimal  trajectories  from  follower  to  leader 
are  unique.  Then,  the  limiting  trajectory  obtained 
is  unique  and  locally  optimal.  It  is  also  smooth  if 
the  locally  optimal  trajectories  calculated  at  every 
updating  time  are  smooth. 

PROOF.  From  Lemma  4,  the  limiting  trajectory 
is  unique.  It  follows  that  Xk-i {t  —  A)  =  Xk{t)  if 
Xk~i (£)  =  Xqo (t  —  tk~i).  Choose  61,62  such  that 
0  <  <5i  <  62  <  r  for  all  optimal  final  times  T 
of  planned  trajectories  Xk  generated  during  CLP. 
The  limiting  trajectory  Xoo  is  piecewise  smooth 
and  locally  optimal  for  t  G  [tk  +  i8\,tk  +  i5\  + 
62],  i  =  0,1,2...  because  it  coincides  with  the 
planned  trajectories  x(t).  From  Lemma  3  -  in  this 
case  the  Sq  is  a  single  point  -  it  can  be  concluded 
that  Xk(t)  (£  G  \tk,tk  + 81  +  62])  is  optimal  because 


it  is  the  composition  of  two  overlapping  locally 
optimal  trajectories,  Xk(t)  (t  G  [£*,,£*,  +  $2])  and 
Xk(t)  (£  G  [tk  +  81, tk  +  61  +  <52])-  Successive 
applications  of  this  argument  ( i  =  2,3,...)  lead 
to  the  result  that  Xoo(£)  is  locally  optimal.  The 
smoothness  is  also  proved  “piece  by  piece”.  □ 

CLP  is  a  cooperative,  decentralized  algorithm 
for  learning  optimal  controls/trajectories,  start¬ 
ing  from  a  feasible  solution.  Each  agent  is  only 
required  to  calculate  optimal  trajectories  from  its 
own  state  to  that  of  its  nearby  leader.  Because 
agents  are  separated  by  A  time  units  as  they  leave 
Xq,  each  agent  relies  only  on  local  information 
in  order  to  follow  its  predecessor,  and  requires 
no  knowledge  of  the  global  geometry.  There  is 
no  need  for  agents  to  exchange  or  “fuse”  local 
maps  that  they  obtain  individually.  Agents  do  not 
need  to  communicate  their  choice  of  coordinate 
systems  as  they  evolve,  nor  do  they  need  to  know 
the  coordinates  of  Xf.  While  it  is  possible  that 
a  group  of  agents  could  disperse  and  construct 
a  global  map  from  local  information,  such  an 
approach  might  require  significantly  more  compu¬ 
tation  and  communication  than  CLP.  The  latter 
solves  the  optimal  control  problem  in  many  “short 
pieces”,  which  makes  it  appropriate  for  systems 
with  short-range  sensors  (for  example,  in  the  case 
of  a  swarm  of  robots  exploring  unknown  terrain) , 
and  optimal  control  problems  which  are  easier  to 
solve  over  “short”  distances. 


4.  EXAMPLES 

Consider  the  minimum-time  control  of  the  second- 
order  system  x  =  u,  \u\  <  1,  where  the  cost  to  be 
minimized  is  J(x,x,  0)  =  T,  with  the  boundary 
conditions  *(0)  =  7r,  x{T)  =  i(0)  =  x(T)  =  0. 
Here  the  constraint  set  Sq  is  a  single  point  in 
the  state  space.  It  is  well  known  that  the  optimal 
control  for  this  problem  is  “bang-bang” : 

w.m=  /-1  if  t  G  [0,T/2) 

1 ’  \  1  if  £  G  [T/2,  T] 

Under  CLP,  the  trajectory  of  sixth  agent  was 
optimal.  It  is  interesting  to  note  that  in  this  case, 
optimality  was  achieved  after  a  finite  number  of 
iterations.  Some  of  the  iterated  trajectories  are 
shown  in  Fig.  3. 

A  second  example  demonstrates  the  solution  of  a 
“geodesic  discovery”  problem  on  the  plane.  The 
agents  were  governed  by  x(t)  =  u(t),  x  (£),  u(t)  G 
R2.  The  constraint  set  was  a  circle  with  radius 
10,  centered  at  the  origin.  All  agents  departed 
from  the  point  (30,  30)  and  moved  with  a  constant 
speed  ||:r  1 1  =  1.  The  time  separation  A  between 
agents  was  20  seconds.  Each  agent  followed  a 
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Fig.  3.  Iterated  trajectories  for  minimum  time 
control  problem  through  continuous  local 
pursuit  with  A  =  0.37T,  Tq  =  n.  The  ini¬ 
tial  trajectory  was  produced  using  uo  = 
—  sin(2f). 

straight  line  toward  its  predecessor  before  the 
predecessor  reached  the  circle,  and  moved  on  a 
straight  line  perpendicular  to  the  circle,  once  its 
predecessor  had  reached  the  constraint  set.  As 
illustrated  in  Fig.  4,  the  trajectories  converge  to 
the  optimum. 
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Fig.  4.  Iterated  trajectories  generated  by  con¬ 
tinuous  local  pursuit  for  geodesic  discovery 
problems  on  a  plane  with  constrained  final 
state. 


5.  CONCLUSIONS  AND  ONGOING  WORK 

This  paper  explored  a  biologically-inspired  coop¬ 
erative  strategy  (termed  “Continuous  Local  Pur¬ 
suit”)  for  solving  a  class  of  optimal  control  prob¬ 
lems  with  free  final  time  and  partially-constrained 
final  state.  The  proposed  algorithm  generalizes 
previous  models  that  mimic  the  foraging  behavior 
of  ant  colonies  and  allows  a  collective  to  discover 
optimal  controls,  starting  from  an  initial  subopti- 
mal  solution.  Members  of  the  collective  are  only 


required  to  obtain  local  information  on  their  en¬ 
vironment  and  to  calculate  optimal  trajectories 
to  their  nearby  neighbors.  The  CLP  algorithm 
relies  on  cooperation  to  perform  a  task  which 
would  be  difficult  or  impossible  for  a  single  system 
to  perform,  namely  solving  an  optimal  control 
problem  with  limited  information  (in  terms  of 
coordinate  systems  that  describe  the  environment 
or  the  coordinates  of  the  final  state)  and  short- 
range  sensing. 

There  are  several  natural  extensions  of  this  work, 
including  investigating  the  algorithm’s  conver¬ 
gence  rate,  as  well  as  its  ability  to  lead  to  global 
(as  opposed  to  local)  optimum  by  choice  of  the  al¬ 
gorithm’s  parameters.  It  would  also  be  interesting 
to  investigate  comparisons  between  local  pursuit 
and  established  numerical  methods  for  computing 
optimal  controls. 
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